Database insertion and retrieval system and method

ABSTRACT

A database processing system and method for inserting documents formatted in accordance with a markup language into a database and for retrieving them from the database.

The present invention relates generally to data processing, and, more particularly, to database processing for information provided in markup language form.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a system block diagram of a database insertion and retrieval system according to various embodiments;

FIG. 2 is a functional block diagram of the information storage and retrieval application according to various embodiments;

FIG. 3 is a database insertion functional flow diagram according to various embodiments;

FIG. 4 is an illustration of the mapping of attributes contained in input information to identifiers or keys in a hash table index of output information according to various embodiments;

FIG. 5 is a database retrieval functional flow diagram according to various embodiments;

FIGS. 6A and 6B are a flow chart of a database insertion method according to various embodiments;

FIGS. 7A and 7B are a flow chart of a database retrieval method according to various embodiments;

FIG. 8 is an example stylesheet defining attributes to be used for keys according to various embodiments;

FIGS. 9A and 9B are an example stylesheet used to obtain requested XML according to various embodiments;

FIGS. 10A and 10B are an example stylesheet used to obtain requested XML for a particular locale according to various embodiments; and

FIGS. 11A and 11B are an example stylesheet used to obtain requested XML for a particular version according to various embodiments.

DETAILED DESCRIPTION

Embodiments are directed generally to a system and method for inserting document text into a database and for retrieving portions of the document text from that database. In particular, various embodiments can comprise a system and methods for generating one or more keys from selected attributes occurring in input information, and for inserting output information comprising the keys into a database.

With respect to FIG. 1, there is shown a database insertion and retrieval system 100 according to various embodiments. As shown in FIG. 1, the database insertion and retrieval system 100 can comprise a server 101 provided in communication with one or more client devices 102 using a network 103. In various embodiments, the server 101 can comprise an information storage and retrieval application 105 provided in communication with a database 107. The server 101 can further comprise a communication interface configured to accomplish packet-based communication using the network 103.

In various embodiments, the information storage and retrieval application 105 can comprise one or more servlets that include a sequence of programmed instructions that, when executed by a processor of the server 101, cause the server 101 to be configured to perform database insertion and retrieval functions as described herein.

The database 107 can comprise a memory manager 109 and a storage device 111 provided in communication with the memory manager 109. In various embodiments, the database 107 can store and retrieve information or data in response to one or more Structured Query Language (SQL) instructions. The storage device 111 can comprise a hard disk drive configured to store information in accordance with SQL. Further, the memory manager 109 can comprise a database manager that includes a local memory 112. In various embodiments, the local memory 112 of the memory manager 109 can comprise a hash table index 113 and recently accessed database information from the storage device 111. In various embodiments, the local memory 112 of the memory manager 109 can have a shorter access time than the storage device 111. For example, the local memory 112 can comprise a Random Access Memory (RAM) and the storage device 111 can comprise a hard disk drive, in which case the local memory 112 can have an access time on the order of ten times shorter than that of the storage device 111. In various embodiments, the local memory 112 can comprise a fixed memory size specified by a target threshold size parameter. The memory manager 109 can be configured to remove the oldest information in the local memory 112 to provide capacity to store the transformed information and to maintain the size of the local memory 112 below the target threshold size. The target threshold size and the frequency of checking whether or not the target threshold size has been exceeded can each be configurable parameters controlled by the user.
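By way of illustration only, the oldest-first removal performed by the memory manager 109 might be sketched in Python as follows. The class name `LocalMemory`, its method names, and the choice to measure size in characters are assumptions made for this sketch and do not appear in the embodiments. The sketch also checks the threshold on every write, whereas the embodiments describe the checking frequency as a configurable parameter.

```python
from collections import OrderedDict


class LocalMemory:
    """Oldest-first store bounded by a target threshold size (in characters)."""

    def __init__(self, target_threshold_size):
        self.target_threshold_size = target_threshold_size
        self.entries = OrderedDict()  # insertion order tracks age
        self.current_size = 0

    def put(self, key, text):
        # Replace any existing entry so its size is not counted twice.
        if key in self.entries:
            self.current_size -= len(self.entries.pop(key))
        self.entries[key] = text
        self.current_size += len(text)
        self.enforce_threshold()

    def get(self, key):
        return self.entries.get(key)

    def enforce_threshold(self):
        # Remove the oldest entries until the size is within the threshold.
        while self.current_size > self.target_threshold_size and self.entries:
            _, oldest = self.entries.popitem(last=False)
            self.current_size -= len(oldest)


memory = LocalMemory(target_threshold_size=10)
memory.put("a", "12345")
memory.put("b", "12345")
memory.put("c", "123")  # total would be 13 characters, so "a" is removed
```

An ordered dictionary is used here because it keeps entries in insertion order, which makes the oldest entry available at one end without a separate timestamp.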

The client device 102 can comprise a Personal Computer (PC) or workstation including, but not limited to, a desktop PC, laptop PC, tablet PC, Personal Digital Assistant (PDA), cellular terminal or handset, wireless terminal or handset, Internet appliance, or any other such device. In various embodiments, the client device 102 can comprise a communication interface configured to accomplish packet-based communication using the network 103. For example, the client device 102 can include a browser application such as Microsoft® Internet Explorer™ available from Microsoft Corporation of Redmond, Wash., or Mozilla Firefox™ available from the Mozilla Foundation of Mountain View, Calif. In various embodiments, the client device 102 can communicate with the server 101 using the network 103 in accordance with the HyperText Transfer Protocol (HTTP). For example, a user can establish a session with the server 101 by entering the Uniform Resource Locator (URL) associated with the server 101 into an address field of the browser application. In various embodiments, the client device 102 can also comprise a standard set of hardware and software such as, but not limited to, a processor, Read Only Memory (ROM), Random Access Memory (RAM), communication ports, user interface, operating system, application programs, as well as standard peripherals such as, but not limited to, a data entry device such as a keyboard, a pointing and selection device such as a mouse or trackball, and a display. The operating system can be configured to support application programs configured to accept user input via the user interface in the form of interactive pages comprising static and dynamic display data and data entry fields.

In various embodiments, the network 103 can comprise a packet-based network configured to transfer packet-based information. For example, the network 103 can comprise an Internet Protocol (IP) based network in which information is transferred in accordance with the Transmission Control Protocol (TCP)/IP standard such as, for example, the Internet. In various embodiments, the network 103 can comprise an intranet, a wireless communication network such as Global System for Mobile Communications (GSM) or Code Division Multiple Access (CDMA), a satellite communication network, or a Local Area Network (LAN) or Wireless LAN based on, for example, the IEEE 802.11 standard. Other variations are possible. For example, the network 103 can also comprise a connection-based network such as, for example, the Public Switched Telephone Network (PSTN).

With respect to FIG. 2, there is shown a functional block diagram of the information storage and retrieval application 105 according to various embodiments. As shown in FIG. 2, in various embodiments, the information storage and retrieval application 105 can comprise an input/output portion 150, a translator portion 160, and a database interface portion 170. In various embodiments, the input/output portion 150, the translator portion 160, and the database interface portion 170 can comprise one code object. In various alternative embodiments, each of the portions 150, 160 and 170 can comprise multiple objects provided in communication using, for example, interprocess communication techniques.

In various embodiments, the input/output portion 150 can comprise a sequence of Java™ instructions that configure the information storage and retrieval application 105 to input and output information in accordance with the HyperText Transfer Protocol (HTTP). Other embodiments are possible. For example, in various alternative embodiments, the information storage and retrieval application 105 can comprise one or more Common Gateway Interface (CGI) scripts.

Further, in various embodiments, the translator portion 160 can comprise a markup language translator configured to read input information and translate the input information into output information in accordance with translation instructions. In various embodiments, the input information and output information can be a text stream formatted in accordance with the Extensible Markup Language (XML). Further, the markup language translator can be configured to perform Extensible Stylesheet Language Transformations (XSLT) in accordance with translation instructions specified by one or more Extensible Stylesheet Language (XSL) stylesheets 165. The translator portion 160 can accept the input information as an input file or as a document contained in an input file. The translator portion 160 can provide the output information as an output file. The translator portion 160 can thus operate as an XSLT processor configured to translate a first XML document into a second XML document, for example. In various embodiments, the stylesheets 165 can be instantiated at the time of application installation. In various embodiments, the stylesheets 165 are maintained in non-volatile storage of the server 101, but are not included in the database 107.

In various embodiments, the database interface portion 170 can be configured to communicate with the database 107. For example, the database interface portion 170 can be configured to generate and output to the database 107 an information storage request or an information retrieval request. The information storage and information retrieval requests can be formatted in accordance with the Structured Query Language (SQL). Database requests from the database interface portion 170 can be received by the memory manager 109 of the database 107. In various embodiments, the database interface portion 170 can comprise a Java™ servlet.

In operation, in various embodiments, the translator portion 160 can be configured to receive input information and translate the input information, in accordance with translation instructions specified by one or more stylesheets 165, into output information to be stored in the database 107. In particular, the translator portion 160 can be configured to generate a key from an attribute occurring in the input information, the input information being formatted in accordance with a markup language. In various embodiments, the key can be an index key used for retrieving the output information from the database 107. A different key can be associated with each of many different types of attributes. In various embodiments, the attributes in the input information that are used by the translator portion 160 to generate the keys can be defined in one or more stylesheets 165. The stylesheets 165 can be customized to generate keys from a variety of attribute types according to the needs of the user. FIG. 8 is an example stylesheet 165 defining attributes to be used for keys according to various embodiments.

Furthermore, stylesheets 165 can be used to specify to the translator portion 160 the manner in which to add the keys to a hash table index. In various embodiments, the hash table index can comprise an internal database index.

With respect to FIG. 3, there is shown a database insertion functional flow diagram in accordance with various embodiments. As shown in FIG. 3, the translator portion 160 can receive the input information 301 and apply a first stylesheet 165 to generate keys based on occurrences of the attribute(s) specified in the first stylesheet 165. A key can comprise an identifier that serves to identify the information, such as markup language data or a tag, associated with the corresponding attribute. In various embodiments, the translator portion 160 can be configured to generate one such identifier for every occurrence of the corresponding attribute in the input information 301. Each such generated identifier can be included in the output information 302. Thus, the output information 302 generated by the translator portion 160 can comprise one or more identifiers, each of which corresponds to an occurrence of the selected attribute(s) in the input information 301, identifies the information associated with the attribute in the input information 301, and is added or inserted into the database 107. In various embodiments, the output information 302 can comprise keys in a hash table index. A second stylesheet 165 can be used to specify to the translator portion 160 the manner in which to add the keys to a hash table index. The hash table index can comprise an internal database index.

In various embodiments, the database interface portion 170 can be configured to apply an insertion instruction page 303 to select insertion of the output information 302 into the database 107 as either a single document or file, or as several compressed documents or files. The insertion instruction page 303 can comprise a markup language file such as, for example, a HyperText Markup Language (HTML) page. The database interface portion 170 can then upload the input information 301 for insertion into the database 107. In various embodiments, the database interface portion 170 can comprise a Java™ servlet. The input information 301 can comprise XML formatted information. In various embodiments, the input information 301 can be compressed using a compression algorithm such as, for example, the java.util.zip Java™ compression utility of the Java™ 2 Platform Std. Ed. v 1.4.2 available from Sun Microsystems of Santa Clara, Calif. In various alternative embodiments, another ZIP compression algorithm can be used such as, for example, PKZIP available from PKWARE, Inc. of Milwaukee, Wis., or the WinZip™ product available from WinZip Computing.

Furthermore, in various embodiments, the translator portion 160 can be configured to generate multiple levels of identifiers. Each level of identifiers can be hierarchically related to another one of the levels (for example, the immediately preceding level or the immediately following level). In various embodiments, a top-level identifier can serve to identify an entire input information 301 file such as, for example, an XML file. Multiple sub-level identifiers can be provided, wherein each sub-level identifier serves to identify any XML in the input information 301 that meets the attribute criteria specified in the applicable stylesheet 165. Further, the translator portion 160 can be configured to index all of the identifiers, or keys, by associating each sub-level identifier with its immediately preceding (for example, next highest priority) sub-level identifier, and by associating each sub-level identifier with its top-level identifier.

Example input information 301 is set forth in Table 1 below. As shown in Table 1, the input information 301 can comprise an XML file.

TABLE 1 Input Information

<?xml version='1.0' encoding='ISO-8859-1' ?>
<task ID="my.test" type="merc">
  <title><cdata>Tests mylinks</cdata></title>
  <objective><cdata>testing mylinks</cdata></objective>
  <subtask ID="my.test.link">
    <title><cdata>linktest</cdata></title>
    <step> </step>
  </subtask>
  <subtask ID="my.test.run">
    <title><cdata>run my test</cdata></title>
    <step> </step>
  </subtask>
</task>

Upon receiving the input information 301 shown in Table 1, the translator portion 160 can apply the first stylesheet 165 to generate the identifiers. For example, if the stylesheet 165 specifies the “ID” attribute in the input information 301 to be used to generate identifiers, the translator portion 160 can generate one identifier for every occurrence of the “ID” attribute encountered in the input information 301. Each generated identifier is included in the output information 302. Thus, the output information 302 generated by the translator portion 160 can comprise one or more identifiers, each of which corresponds to an occurrence of the selected attribute(s) in the input information 301, identifies the information associated with the attribute in the input information 301, and is added or inserted into the database 107.

In various embodiments, the output information 302 can comprise keys in a hash table index. A second stylesheet 165 can be used to specify to the translator portion 160 the manner in which to add the keys to a hash table index. The hash table index can comprise an internal database index. In various embodiments, the hash table index can be stored using the hash table index 113 of the memory manager 109.

Example output information 302 is set forth in Table 2 below. As shown in Table 2, the output information 302 can comprise an XML file.

TABLE 2 Output Information

ID = “my.test”, Top-level = “my.test”
ID = “my.test.link”, Top-level = “my.test”
ID = “my.test.run”, Top-level = “my.test”
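By way of illustration only, the mapping from the input information of Table 1 to the identifiers of Table 2 might be sketched in Python as follows. The embodiments perform this step with an XSLT stylesheet; the sketch substitutes the standard `xml.etree.ElementTree` parser, and the function name `generate_keys` and the dictionary field names are assumptions made for this sketch. The sample is a trimmed copy of the Table 1 document.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<task ID="my.test" type="merc">
  <title><cdata>Tests mylinks</cdata></title>
  <subtask ID="my.test.link"><title><cdata>linktest</cdata></title></subtask>
  <subtask ID="my.test.run"><title><cdata>run my test</cdata></title></subtask>
</task>"""


def generate_keys(xml_text, attribute="ID"):
    """Return one key per occurrence of `attribute`, linked to its parent and top level."""
    root = ET.fromstring(xml_text)
    keys = []

    def walk(element, parent_id, top_id):
        value = element.get(attribute)
        if value is not None:
            # The first match on a path from the root is treated as the top level.
            top_id = top_id or value
            keys.append({"id": value, "parent": parent_id, "top_level": top_id})
            parent_id = value
        for child in element:
            walk(child, parent_id, top_id)

    walk(root, None, None)
    return keys


keys = generate_keys(SAMPLE)
for key in keys:
    print(f'ID = "{key["id"]}", Top-level = "{key["top_level"]}"')
```

The `parent` field records the immediately preceding level for each sub-level identifier, which Table 2 does not list but which the hierarchical indexing described above relies on.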

With respect to FIG. 4, there is shown an illustrative mapping of attributes contained in the input information 301 to identifiers or keys in the hash table index of the output information 302 for the example input and output information of Tables 1 and 2 in accordance with the database insertion process 300. As shown in FIG. 4, the “ID” attribute is specified by the stylesheet 165 for generating database identifiers, or keys. Thus, the translator portion 160 generates multiple levels of identifiers for occurrences of the “ID” attribute in the input information 301. In particular, the “ID” attribute for “my.test” is assigned as the top-level identifier, and the “ID” attributes for “my.test.link” and “my.test.run” are determined to be sub-level identifiers. As shown in FIG. 4, the sub-level identifiers for “my.test.link” and “my.test.run” are associated with the top-level identifier “my.test.” Therefore, the sub-level identifiers for “my.test.link” and “my.test.run” are hierarchically related to the top-level identifier “my.test.” The top-level identifier “my.test” serves to identify the entire input information 301 file, while the sub-level identifiers serve to identify XML in the input information 301 associated with the sub-level identifier. The hierarchically-related top-level identifiers and sub-level identifiers shown in the output information 302 of FIG. 4 can comprise a hash table index 113 useful for retrieving all or a portion of the input information 301 from the database 107. Thus, the input information 301 can be inserted into the database 107 by the database interface portion 170 in accordance with the insertion instruction page 303 as described with respect to FIG. 3, for example, as a compressed file.

After insertion into the database 107, the inserted document text, for example, markup language information of the input information 301, can be retrieved from the database 107 using the hash table index (for example, the output information 302). With respect to FIG. 5, there is shown a database retrieval flow diagram in accordance with various embodiments. As shown in FIG. 5, upon receiving a database read request from the client device 102, the input/output portion 150 can forward the database read request to the database interface portion 170. In various embodiments, the client device 102 can submit a database read request comprising a specific identifier to be obtained from the database 107. For example, the database read request can comprise the identifier “ID=‘my.test.link’”. It will be recalled from the previous example that “ID=‘my.test.link’” is a sub-level identifier that is hierarchically related to the top-level identifier “ID=‘my.test’”. An example hash table index is shown in Table 3 below.

TABLE 3 Hash Table Index

Key 1   ID = “my.test”, Top-level = “my.test”
Key 2   ID = “my.test.link”, Top-level = “my.test”
Key 3   ID = “my.test.run”, Top-level = “my.test”
Key 4   ID = “hello.world”, Top-level = “hello.world”
Key 5   ID = “justin.time”, Top-level = “justin.time”
Key 6   ID = “outof.time”, Top-level = “justin.time”

Although six keys are shown in Table 3, it is to be understood that any number of keys can be included in the hash table index. The input/output portion 150 can forward the database read request to the database interface portion 170. Upon receiving the database read request, the database interface portion 170 can search the keys in the hash table index 113, via table look-up or other method, for the identifier contained in the database read request. For example, the database interface portion 170 can perform a table lookup of the keys in the hash table index 113 to determine that the second key in Table 3 corresponds to the specific identifier (“my.test.link”) contained in the example database read request. The database interface portion 170 can then form a database request using the sub-level identifier and top-level identifier located in the hash table index 113. The database interface portion 170 can then send the database request to the database 107.
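By way of illustration only, the table lookup and the forming of the database request might be sketched in Python as follows. The index contents are those of Table 3. The function name `form_database_request`, the table name `documents`, and the column names are hypothetical, since the embodiments do not specify a schema.

```python
# Hash table index of Table 3: identifier -> top-level identifier.
HASH_TABLE_INDEX = {
    "my.test": "my.test",
    "my.test.link": "my.test",
    "my.test.run": "my.test",
    "hello.world": "hello.world",
    "justin.time": "justin.time",
    "outof.time": "justin.time",
}


def form_database_request(identifier, index=HASH_TABLE_INDEX):
    """Look up the identifier and build a request, or return None if no key matches."""
    top_level = index.get(identifier)
    if top_level is None:
        return None
    return {
        # Hypothetical table and column names; no schema is given in the source.
        "sql": "SELECT content FROM documents WHERE top_level_id = ?",
        "params": (top_level,),
        # The sub-level identifier is carried along for later filtering.
        "sub_level": identifier,
    }


request = form_database_request("my.test.link")
```

The request selects by the top-level identifier because, as described for the retrieval flow, the database returns the entire document associated with that identifier.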

In various embodiments, upon receiving the database request, the memory manager 109 of the database 107 can determine if the information corresponding to the identifier is contained in the local memory 112 at the memory manager 109. If so, then the memory manager 109 can return the information (for example, XML) associated with the identifier in the database request to the database interface portion 170, without reading the information from the storage device 111. Because the local memory 112 has a shorter access time than the storage device 111, storing information locally using the memory manager 109 reduces the time needed for the client device 102 to obtain the requested information.

If the requested information is not contained in the local memory 112 of the memory manager 109, then the memory manager 109 performs a database read operation to obtain the requested information from the storage device 111. The memory manager 109 also can add the information read from the storage device 111 to a hash table contained in the local memory 112, for faster access to the information in response to subsequent requests for it. In various embodiments, the information obtained from the database can comprise the entire file or entire amount of information associated with the top-level identifier. For example, the located key “ID=‘my.test.link’, Top-level=‘my.test’” will result in the database 107 returning the entire file (for example, XML document) associated with the “my.test” top-level identifier.
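By way of illustration only, the check of local memory followed by a read from storage might be sketched in Python as follows. An in-memory SQLite database stands in for the storage device 111, and the class name `MemoryManager`, the table name `documents`, and its columns are assumptions made for this sketch.

```python
import sqlite3


def make_storage():
    """Build an in-memory SQLite database standing in for the storage device."""
    connection = sqlite3.connect(":memory:")
    connection.execute(
        "CREATE TABLE documents (top_level_id TEXT PRIMARY KEY, content TEXT)"
    )
    connection.execute(
        "INSERT INTO documents VALUES (?, ?)",
        ("my.test", '<task ID="my.test"><subtask ID="my.test.link"/></task>'),
    )
    return connection


class MemoryManager:
    def __init__(self, storage):
        self.storage = storage
        self.local_memory = {}  # top-level identifier -> document text
        self.storage_reads = 0  # counts requests that reached storage

    def fetch(self, top_level_id):
        # Serve from local memory when possible, without touching storage.
        if top_level_id in self.local_memory:
            return self.local_memory[top_level_id]
        self.storage_reads += 1
        row = self.storage.execute(
            "SELECT content FROM documents WHERE top_level_id = ?", (top_level_id,)
        ).fetchone()
        if row is None:
            return None
        # Keep a local copy for faster access on subsequent requests.
        self.local_memory[top_level_id] = row[0]
        return row[0]


manager = MemoryManager(make_storage())
first = manager.fetch("my.test")   # read from storage, then kept locally
second = manager.fetch("my.test")  # served from local memory
```

The `storage_reads` counter exists only so that the effect of the local copy can be observed; it is not part of the described system.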

In various embodiments, upon receiving the information from the database 107, the database interface portion 170 can forward the received information to the translator portion 160. The translator portion 160 can apply a third stylesheet 165 that parses the information received from the database to strip out unwanted information prior to presenting or outputting the information to the client device 102. For example, for the database access request comprising the sub-level identifier “my.test.link,” the translator portion 160 can remove all but the following information as shown in Table 4:

TABLE 4 Transformed Database Output Information

<subtask ID="my.test.link">
  <title><cdata>linktest</cdata></title>
  <step> </step>
</subtask>

In this case, for information flowing from the database to the client device, the information obtained from the database 107 can comprise information input to the translator portion 160, and the transformed information provided to the client device can comprise information output by the translator portion 160. The transformed database output information can then be forwarded to the input/output portion 150 and transferred to the client device 102 for further processing such as, for example, display to a user.
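By way of illustration only, the removal of all but the requested sub-level XML might be sketched in Python as follows. The embodiments use a third XSLT stylesheet for this step; the sketch substitutes `xml.etree.ElementTree`, and the function name `extract_by_identifier` is an assumption made for this sketch.

```python
import xml.etree.ElementTree as ET

DOCUMENT = """<task ID="my.test" type="merc">
  <title><cdata>Tests mylinks</cdata></title>
  <subtask ID="my.test.link"><title><cdata>linktest</cdata></title><step> </step></subtask>
  <subtask ID="my.test.run"><title><cdata>run my test</cdata></title><step> </step></subtask>
</task>"""


def extract_by_identifier(xml_text, identifier, attribute="ID"):
    """Return the serialized element whose attribute equals identifier, or None."""
    root = ET.fromstring(xml_text)
    for element in root.iter():  # iter() visits the root element as well
        if element.get(attribute) == identifier:
            # Drop the whitespace that followed the element inside its parent.
            element.tail = None
            return ET.tostring(element, encoding="unicode")
    return None


fragment = extract_by_identifier(DOCUMENT, "my.test.link")
print(fragment)
```

Requesting the top-level identifier "my.test" with the same function returns the whole document, which corresponds to obtaining the top-level XML rather than a subtask.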

Therefore, unlike other databases available for maintaining markup language information, various embodiments comprising a system and method for inserting document text into a database and for retrieving portions of the document text from that database as described herein can provide, among other things, improved speed and efficiency in indexing and searching of information as well as improved speed of information retrieval from a database, because only the desired data is transferred to the requesting device. Further, various embodiments can be implemented using a relatively small number of instructions compared to other systems. While other databases use XPath mechanisms to extract markup language from a database, various embodiments use unique keys created from attribute names to identify and obtain information from a database. In addition, various embodiments comprising the customized stylesheets allow the user the capability to customize how information is parsed into the database and also how information is displayed to the user.

With respect to FIGS. 9A and 9B, there is shown an example third stylesheet 165 used to obtain the requested XML according to various embodiments. In the example shown in FIGS. 9A and 9B, the stylesheet 165 can cause the translator portion 160 to be configured to obtain the top level XML associated with a top-level identifier or subtask level XML associated with a child identifier. In various embodiments, the third stylesheet 165 can be a .xsl file.

With respect to FIGS. 10A and 10B, there is shown another example third stylesheet 165 according to various embodiments. In the example shown in FIGS. 10A and 10B, the stylesheet 165 can cause the translator portion 160 to be configured to obtain the XML associated with a particular language based on a chosen locale. For example, if information is stored in the database 107 in three different languages (such as, for example, English, French and German), the stylesheet 165 can cause the translator portion 160 to obtain only the French version, if the user requested the French version and the French version is available. The XML for other locales is removed from the information provided to the requesting client device 102.
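By way of illustration only, locale-based filtering might be sketched in Python as follows. The attribute name `locale`, the sample document, and the function name `keep_locale` are hypothetical, since the stylesheet of FIGS. 10A and 10B is not reproduced in the text. The sketch also assumes that the document is left untouched when the requested locale is not available, a case the embodiments do not describe.

```python
import xml.etree.ElementTree as ET

DOCUMENT = """<task ID="my.test">
  <title locale="en">Test</title>
  <title locale="fr">Essai</title>
  <title locale="de">Versuch</title>
  <step>1</step>
</task>"""


def keep_locale(xml_text, locale, attribute="locale"):
    """Remove elements tagged with a different locale; untagged elements are kept."""
    root = ET.fromstring(xml_text)
    available = {e.get(attribute) for e in root.iter() if e.get(attribute)}
    if locale not in available:
        # Assumption: leave the document untouched if the locale is missing.
        return root
    for parent in list(root.iter()):
        for child in list(parent):
            if child.get(attribute) not in (None, locale):
                parent.remove(child)
    return root


french = keep_locale(DOCUMENT, "fr")
titles = [title.text for title in french.findall("title")]
```

Elements that carry no locale attribute, such as the `step` element in the sample, pass through unchanged.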

With respect to FIGS. 11A and 11B, there is shown yet another example third stylesheet 165 according to various embodiments. In the example shown in FIGS. 11A and 11B, the stylesheet 165 can cause the translator portion 160 to be configured to obtain the XML associated with a particular item of equipment or version of equipment. For example, if information is stored in the database 107 for different versions of a document, the stylesheet 165 can cause the translator portion 160 to obtain the latest version.
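By way of illustration only, selection of the latest version might be sketched in Python as follows. The attribute name `version`, the dotted numeric version format, the sample document, and the function names are hypothetical, since the stylesheet of FIGS. 11A and 11B is not reproduced in the text.

```python
import xml.etree.ElementTree as ET

DOCUMENT = """<manual ID="pump.manual">
  <procedure version="1.2">Old steps</procedure>
  <procedure version="1.10">New steps</procedure>
  <procedure version="1.9">Interim steps</procedure>
</manual>"""


def version_key(text):
    # Compare dotted versions numerically so that 1.10 sorts after 1.9.
    return tuple(int(part) for part in text.split("."))


def latest_version(xml_text, tag="procedure", attribute="version"):
    """Return the element of the given tag with the highest version, or None."""
    root = ET.fromstring(xml_text)
    candidates = [e for e in root.iter(tag) if e.get(attribute)]
    if not candidates:
        return None
    return max(candidates, key=lambda e: version_key(e.get(attribute)))


latest = latest_version(DOCUMENT)
```

Comparing versions as tuples of integers rather than as strings avoids ranking "1.9" above "1.10".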

In various embodiments, the stylesheets 165 of FIGS. 10A and 10B, and 11A and 11B, can be applied after the stylesheet 165 of FIGS. 9A and 9B obtains the appropriate XML. In various embodiments, for processing of any stylesheet 165, elements encountered during translation that do not contain the requested attributes, or that do not match, can be returned to the requesting client device 102.

With respect to FIGS. 6A and 6B, there is shown a database insertion method 600 according to various embodiments. As shown in FIG. 6A, the database insertion method 600 can commence at 601. The method can proceed to 603, at which the user selects a file for database insertion. The selection can be accomplished, for example, by entering a file identifier such as, for example, a file name, into a data entry field of an interactive page at the client device 102. The interactive page can comprise an HTML page, for example. The user can cause the client device 102 to send the file to the database servlet (for example, the information storage and retrieval application 105) at the server 101 by actuating a button provided on the interactive page. Upon user actuation of the upload command or button, the client device 102 can transfer the file to the servlet, at 605.

Control can then proceed to 607, at which the file for database insertion can be received by the input/output portion 150 of the database servlet. Upon recognizing a file for database insertion, the input/output portion 150 can forward the file to the translator portion 160. Control can then proceed to 609, at which, upon receiving the input information (for example, the file for database insertion), the translator portion 160 can select the first stylesheet 165. In various embodiments, the first stylesheet 165 can be retrieved from a memory of the server 101 or using the network 103. Control can then proceed to 611, at which the translator portion 160 can apply the first stylesheet 165 to the received input information to generate a key for each occurrence in the input information of one of the attributes specified in the first stylesheet 165. In various embodiments, the key can comprise one or more identifiers. Control can then proceed to 613, at which the translator portion 160 can construct a hierarchy of related identifiers as the keys are generated. In various embodiments, the keys can comprise, for example, a first sub-level identifier and another identifier that is the immediately preceding level identifier to which the first sub-level identifier belongs. Control can proceed to 615, at which the translator portion 160 can determine if the end of the input information has been reached (for example, end of file). If not, then control can return to 611 to search for the next attribute in the input information selected by the first stylesheet 165, until keys have been generated for all matching attributes found in the input information.

Control can then proceed to 617, at which the translator portion 160 can select the second stylesheet 165. In various embodiments, the second stylesheet 165 can be retrieved from a memory of the server 101 or using the network 103. Referring to FIG. 6B, control can then proceed to 619, at which the translator portion 160 can generate the output information 302 using the identifiers and keys determined at 611 and 613 in accordance with the second stylesheet 165. In various embodiments, the output information 302 can comprise a hash table index 113. Control can then proceed to 621, at which the translator portion 160 can store the hash table index 113 in the local memory 112 of the memory manager 109.

Control can then proceed to 623, at which the database interface portion 170 can retrieve the insertion instruction page 303 from the database 107. The insertion instruction page 303 can comprise a markup language file such as, for example, a HyperText Markup Language (HTML) page. Control can then proceed to 625, at which the database interface portion 170 can apply the insertion instruction page 303 to select the insertion mode for adding the input information 301 into the database 107. Control can proceed to 627, 629, or 631 for insertion of the input information 301 into the database 107 in accordance with the insertion instruction page 303. For example, at 627, the database interface portion 170 can format the input information 301 for insertion into the database 107 without using any compression. Alternatively, at 629, the database interface portion 170 can format the input information 301 for insertion into the database 107 by performing data compression of the input information 301 as a single document. In various embodiments, the input information 301 can be compressed using a compression algorithm such as, for example, the java.util.zip compression utility. Alternatively, at 631, the database interface portion 170 can format the input information 301 for insertion into the database 107 by performing data compression of the input information 301 as multiple distinct files. For example, if the input information 301 is received as a single ZIP file, then the database interface portion 170 can unzip the ZIP file and insert individually each compressed file that is included in the ZIP file. In various embodiments, the database interface portion 170 can be configured to insert the input information 301 into the database 107 using the METHOD=“POST” HTML instruction.
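By way of illustration only, the three insertion modes selected at 625 might be sketched in Python as follows. The embodiments name the java.util.zip utility; the sketch substitutes the Python standard `zlib` and `zipfile` modules. The mode names "plain", "single", and "multiple" and the function name `prepare_for_insertion` are assumptions made for this sketch, and compressing each archive member again after unzipping is one reading of step 631 rather than a confirmed detail.

```python
import io
import zipfile
import zlib


def prepare_for_insertion(data, mode):
    """Return a list of (name, payload) records ready to be stored."""
    if mode == "plain":
        # 627: no compression.
        return [("document", data)]
    if mode == "single":
        # 629: compress the whole input as one document.
        return [("document", zlib.compress(data))]
    if mode == "multiple":
        # 631: the input is a ZIP archive; handle each member separately.
        records = []
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            for name in archive.namelist():
                records.append((name, zlib.compress(archive.read(name))))
        return records
    raise ValueError(f"unknown insertion mode: {mode}")


# Build a small ZIP archive in memory to exercise the "multiple" mode.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("a.xml", '<task ID="a"/>')
    archive.writestr("b.xml", '<task ID="b"/>')

records = prepare_for_insertion(buffer.getvalue(), "multiple")
```

Each returned record corresponds to one document that would then be stored in the database 107.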

Control can then proceed to 633, at which the database interface portion 170 can store in, or upload to, the database 107, the input information 301 from 627 or the compressed input information 301 from 629 or 631 as either a single document or file, or as several compressed documents or files.

With respect to FIGS. 7A and 7B, there is shown a database retrieval method 700 according to various embodiments. As shown in FIG. 7A, the database retrieval method 700 can commence at 701. The method can proceed to 703, at which the client device 102 prepares and sends a database read request to the server 101. In various embodiments, the client device 102 can prepare and send the database read request in response to receiving a request for information from, for example, an application or in response to a user request received via a user interface. In various embodiments, the client device 102 can submit the database read request comprising a specific identifier to be obtained from the database 107. For example, the database read request can comprise the sub-level identifier “ID=‘my.test.link’” or other specific identifier to be obtained from the database 107.

The method can then proceed to 705, at which the information storage and retrieval application (for example, a database servlet) can receive the database read request from the client device 102. In particular, upon receiving a database read request from the client device 102, the input/output portion 150 can forward the database read request to the database interface portion 170. For example, the database read request can comprise the sub-level identifier “ID=‘my.test.link’”.

Control can then proceed to 707, at which, upon receiving the database read request, the database interface portion 170 can search the keys in the hash table index 113, via table look-up or other method, for the identifier contained in the database read request. For example, the database interface portion 170 can perform a table lookup of the keys in the hash table index 113 to determine the key that corresponds to the specific identifier contained in the database read request. Control can then proceed to 709, at which the database interface portion 170 can determine if the hash table index 113 contains keys matching the specific identifier contained in the database read request. If not, control can proceed to 711, at which the database interface portion 170 can send (via the input/output portion 150) an error message to the client device 102 indicating no matching entry in the database 107. In various embodiments, the error message can comprise an HTTP response indicating request failure.

If a key is located within the hash table index 113, then control can proceed to 713, at which the database interface portion 170 can form a database request using the sub-level identifier, if received, and the top-level identifier located in the hash table index 113, and then send the database request to the database 107.
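By way of non-limiting illustration, the index search at 707 through 713 might be sketched as a dictionary lookup that maps a requested sub-level identifier to its top-level identifier. The sketch is written in Python; the names HASH_TABLE_INDEX and build_database_request, the "my.test.title" entry, and the tuple form of the database request are hypothetical and are used only to make the example concrete.

```python
from typing import Dict, Optional, Tuple

# Hypothetical contents of the hash table index 113:
# sub-level identifier -> top-level identifier.
HASH_TABLE_INDEX: Dict[str, str] = {
    "my.test.link": "my.test",
    "my.test.title": "my.test",
}


def build_database_request(
    identifier: str, index: Dict[str, str]
) -> Optional[Tuple[str, str]]:
    """Return (top-level, sub-level) for a matching key, or None if absent."""
    top_level = index.get(identifier)  # 707/709: table lookup of the key
    if top_level is None:
        # 711: no matching key; the caller would report a request failure.
        return None
    # 713: form the database request from both identifiers.
    return (top_level, identifier)
```

For the example identifier “my.test.link”, such a lookup would yield the top-level identifier “my.test”; an identifier absent from the index would yield no request, and an error message would be returned to the client device 102 instead.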

Control can then proceed to 715, at which, upon receiving the database request, the memory manager 109 of the database 107 can determine if the information corresponding to the identifier is contained in local memory 112 at the memory manager 109. If so, then control can proceed to 717, at which the memory manager 109 can return the information (for example, XML) associated with the identifier in the database request to the database interface portion 170, without reading the information from the storage device 111. Because the local memory 112 has a lower access latency than the storage device 111, storing information locally using the memory manager 109 reduces the time required for the client device 102 to obtain the requested information.

If the requested information is not contained in the local memory 112 of the memory manager 109, then control can proceed to 719, at which the memory manager 109 performs a database read operation to obtain the requested information from the storage device 111. In various embodiments, the information obtained from the database storage device 111 can comprise the entire file or entire amount of information associated with the top-level identifier. For example, the located key “ID=‘my.test.link’, Top-level=‘my.test’” will result in the database 107 returning the entire file (for example, XML document) associated with the “my.test” top-level identifier.
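By way of non-limiting illustration, the read path at 715 through 719 might be sketched as follows: the local memory is consulted first, and the storage device is read only when the requested information is not held locally. The sketch is written in Python; the class name MemoryManager, its method names, and the use of dictionaries for the local memory 112 and the storage device 111 are assumptions made for the example.

```python
from typing import Dict, Tuple


class MemoryManager:
    """Illustrative stand-in for the memory manager 109."""

    def __init__(self, storage: Dict[str, str]) -> None:
        self.storage = storage  # stands in for storage device 111
        self.local_memory: Dict[Tuple[str, str], str] = {}  # local memory 112
        self.storage_reads = 0  # counts reads that reached the storage device

    def read(self, top_level: str, sub_level: str) -> Tuple[str, bool]:
        """Return (information, served_from_local_memory)."""
        key = (top_level, sub_level)
        if key in self.local_memory:
            # 715/717: return locally held information without a storage read.
            return self.local_memory[key], True
        # 719: read the entire document associated with the top-level identifier.
        self.storage_reads += 1
        return self.storage[top_level], False

    def remember(self, top_level: str, sub_level: str, information: str) -> None:
        """Keep information locally so later requests avoid the storage device."""
        self.local_memory[(top_level, sub_level)] = information
```

In this sketch a first request for a given identifier reaches the storage device and returns the whole document, while a repeated request is served from local memory once the information has been retained there.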

Control can then proceed to 721, at which, upon receiving the information from the database 107, the database interface portion 170 can forward the received information to the translator portion 160, and the translator portion 160 can apply a third stylesheet 165 that parses the information received from the database to strip out unwanted information prior to presenting or outputting the information to the client device 102, such that the transformed information returned to the client device 102 is only the information associated with the selected sub-level identifier, and not the remaining information in the document stored in the database. Therefore, only the information needed by the client device 102 is actually transferred to the client device 102, resulting in more efficient and timely responses to database requests. In various embodiments, the translator portion 160 can be configured to perform an XSL translation that results in only pertinent data being obtained. For example, the translator portion 160 can be configured to extract information by identifier and by attributes passed to the database 107. Values that do not agree with the attributes can be removed. Elements that either do not contain the attributes or contain matching values can be passed back to the client.
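By way of non-limiting illustration, the filtering performed at 721 might be sketched as follows. The sketch is written in Python and uses the standard xml.etree.ElementTree module in place of an XSL stylesheet, which is a substitution made for brevity and is not the third stylesheet 165 itself. It assumes that identifiers are carried in an “ID” attribute, that filtering attributes (here a hypothetical “locale” attribute) appear on child elements, and that elements lacking a filtering attribute are retained; the sample document and the function names are likewise hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Hypothetical stored document associated with the top-level identifier "my.test".
SAMPLE_DOCUMENT = (
    '<doc ID="my.test">'
    '<item ID="my.test.link">'
    '<text locale="en">Hello</text>'
    '<text locale="fr">Bonjour</text>'
    "<note>shared</note>"
    "</item>"
    '<item ID="my.test.other"><text locale="en">Other</text></item>'
    "</doc>"
)


def _prune(element: ET.Element, attributes: Dict[str, str]) -> None:
    """Remove descendants whose attribute values disagree with the request."""
    for child in list(element):
        mismatch = any(
            name in child.attrib and child.attrib[name] != value
            for name, value in attributes.items()
        )
        if mismatch:
            element.remove(child)
        else:
            # Elements without the attribute, or with a matching value, stay.
            _prune(child, attributes)


def extract(document_xml: str, identifier: str, attributes: Dict[str, str]) -> str:
    """Return only the element with the given ID, filtered by attributes."""
    root = ET.fromstring(document_xml)
    target = next((el for el in root.iter() if el.get("ID") == identifier), None)
    if target is None:
        return ""
    _prune(target, attributes)
    return ET.tostring(target, encoding="unicode")
```

Calling extract(SAMPLE_DOCUMENT, "my.test.link", {"locale": "en"}) in this sketch would return the selected element with its English text and its attribute-free note, while omitting both the French text and the unrelated “my.test.other” element.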

Referring to FIG. 7B, control can then proceed to 723, at which the memory manager 109 also can add the transformed information to the hash table in local memory 112, for faster access to the information in response to subsequent requests for it. Upon adding the transformed information to the hash table, control can then proceed to 725, at which the memory manager 109 can determine whether or not a looping timeframe is met. For example, the memory manager 109 can maintain a counter that is incremented each time information is added to the hash table. Upon the counter reaching a predetermined number (for example, a parameter specifying the number of iterations or “loops” to occur before setting the looping timeframe to an active state), control can proceed to 727, at which the memory manager 109 can determine if the local memory 112 size has exceeded a target threshold size. If the looping timeframe is not met (for example, the number of iterations has not yet been reached), then control can proceed to 731.

If at 727 the memory manager 109 determines that the local memory 112 size has exceeded the target threshold size, then control can proceed to 729, at which the memory manager 109 can remove the oldest information in local memory 112 to provide capacity to store the transformed information and maintain the size of the local memory 112 below the target threshold size. In various embodiments, the target threshold is configurable and can be modified by, for example, updating an input parameter specifying the target size threshold contained in a configuration file.
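By way of non-limiting illustration, the counter-gated size check at 725 through 729 might be sketched as follows. The sketch is written in Python; the class name LocalMemory, the parameter names, the measurement of size in characters, the resetting of the counter after each check, and the retention of the most recently added entry are all assumptions made for the example and are not specified above.

```python
from collections import OrderedDict


class LocalMemory:
    """Illustrative stand-in for the hash table held in local memory 112."""

    def __init__(self, loops_before_check: int, target_size: int) -> None:
        self.entries: "OrderedDict[str, str]" = OrderedDict()  # oldest first
        self.loops_before_check = loops_before_check  # assumed loop parameter
        self.target_size = target_size  # assumed threshold, in characters
        self.counter = 0

    def size(self) -> int:
        return sum(len(value) for value in self.entries.values())

    def add(self, key: str, information: str) -> None:
        self.entries[key] = information  # 723: add transformed information
        self.counter += 1
        if self.counter < self.loops_before_check:
            return  # 725: looping timeframe not met, skip the size check
        self.counter = 0  # assumption: the counter restarts after each check
        # 727/729: while over the threshold, remove the oldest information,
        # always keeping the entry that was just added.
        while self.size() > self.target_size and len(self.entries) > 1:
            self.entries.popitem(last=False)
```

Gating the size check on a counter in this manner means that the cost of measuring the local memory is incurred only once every several insertions, at the expense of the local memory temporarily exceeding the target threshold size between checks.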

Control can then proceed to 731, at which the input/output portion 150 can send the transformed database information to the client device 102 for further processing such as, for example, display to a user. In various embodiments, the transformed information obtained from the database can be output to the client device 102 as an HTTP response. Control can then proceed to 733, at which the method can end.

Thus has been disclosed a system and method for inserting document text into a database and for retrieving portions of the document text from that database. The system and method can provide, among other things, improved speed and efficiency in indexing and searching of information as well as improved speed of information retrieval from a database, because only the desired data is transferred to the requesting device.

Various embodiments can be implemented using hardware and software components including the PC and related peripherals as described herein. However, it is further apparent to those skilled in the art that the disclosed system may be readily implemented in software using object or object-oriented software development environments that provide portable source code that can be used on a variety of computer platforms. Alternatively, the disclosed system may be implemented partially or fully in hardware using standard logic circuits or a VLSI design. Other hardware or software can be used to implement the systems in accordance with this invention depending on the speed and/or efficiency requirements of the systems, the particular function, and/or a particular software or hardware system, microprocessor, or microcomputer system being utilized. The system and method herein can be readily implemented in hardware and/or software using any known or later developed systems or structures, devices and/or software by those of ordinary skill in the applicable art from the functional description provided herein and with a general basic knowledge of the computer and mark-up language arts.

Moreover, the disclosed methods may be readily implemented in software executed on a programmed general-purpose computer, a special purpose computer, a microprocessor, or the like. In these instances, the systems and methods of this invention can be implemented as a program embedded on a personal computer, such as a Java™ or CGI script, as a resource residing on a server or graphics workstation, as a routine embedded in a dedicated encoding/decoding system, or the like. The system can also be implemented by physically incorporating the system and method into a software and/or hardware system, such as the hardware and software systems of an image processor.

While embodiments of the invention have been described above, it is evident that many alternatives, modifications and variations will be apparent to those skilled in the applicable arts. Accordingly, the embodiments of the invention, as set forth above, are intended to be illustrative, and should not be construed as limitations on the scope of the invention. Various changes may be made without departing from the spirit and scope of the invention. Accordingly, the scope of the present invention should be determined not by the embodiments illustrated above, but by the claims appended hereto and their legal equivalents.

1. A method for processing information from a document using a database, comprising the steps of: associating a tag with one of a plurality of identifiers; generating a hash table comprising a plurality of levels, wherein each one of said plurality of levels is hierarchically related to another one of said plurality of levels, and wherein each one of said plurality of levels is associated with one of said plurality of identifiers; receiving the document comprising text formatted in accordance with a first markup language; determining each occurrence of one of the plurality of identifiers within the text by searching the text for a text stream that matches the identifier using a stylesheet; and generating a hash table index comprising at least one key, wherein the at least one key comprises at least one of the plurality of identifiers.
 2. The method of claim 1, further comprising: retrieving the tag associated with one of the plurality of identifiers at one or more of the plurality of levels.
 3. The method of claim 2, wherein retrieving the tag further comprises: retrieving the tag from a local memory of the database if the tag is stored therein; and retrieving the tag from a hard disk of the database if the tag is not stored in the local memory.
 4. The method of claim 1, wherein the tag comprises text formatted in accordance with the first markup language.
 5. The method of claim 4, wherein the first markup language is extensible markup language (XML).
 6. The method of claim 5, further comprising: selecting the at least one identifier; and modifying the stylesheet to use the at least one selected identifier.
 7. The method of claim 5, wherein the plurality of levels comprises one or more top levels and at least one sublevel associated with each one of the top levels.
 8. The method of claim 7, further comprising: associating the document with one of the top levels.
 9. The method of claim 1, further comprising: performing data compression on the document text to form a compressed document.
 10. The method of claim 9, further comprising: inserting the document text into the local memory and the hard disk of the database according to an insertion page comprising database insertion instructions provided in accordance with a second markup language.
 11. The method of claim 10, further comprising: determining whether or not the local memory size has exceeded a target threshold size; and removing, if the target threshold size has been exceeded, the oldest information from the local memory to maintain the local memory size below the target threshold size.
 12. A system for processing information using a database, comprising: an information storage and retrieval application configured to receive markup language information and database requests from a client device and further comprising a translator portion configured to generate a key based on each occurrence of a selected attribute occurring in a file, the selected attribute being specified using a first stylesheet; and a database coupled to the information storage and retrieval application and further comprising a memory manager.
 13. The system of claim 12, wherein the memory manager further comprises: a local memory including a hash table index and a hash table; wherein the translator portion is configured to form the key using at least one identifier associated with the selected attribute and to add one or more keys to the hash table index in accordance with a second stylesheet.
 14. The system of claim 13, wherein the at least one identifier comprises a top-level identifier and at least one sub-level identifier, and wherein the at least one sub-level identifier and the top-level identifier are hierarchically related.
 15. The system of claim 14, wherein the top-level identifier is associated with a document comprising input information, and wherein each of the at least one sub-level identifiers is identified with a portion of the input information.
 16. The system of claim 13, wherein the translator portion is further configured to insert input information into the database in accordance with an insertion instruction page, and to transform information received from the database in accordance with a third stylesheet.
 17. A computer-readable medium upon which is embodied a sequence of programmable instructions which when executed by a processor cause the processor to perform functions comprising: receiving a document comprising text formatted in accordance with a first markup language; associating a tag with one of a plurality of identifiers, wherein the tag comprises document text; generating a hash table comprising a plurality of levels, wherein each one of said plurality of levels is hierarchically related to another one of said plurality of levels, and wherein each one of said plurality of levels is associated with one of said plurality of identifiers; determining each occurrence of one of the plurality of identifiers within the text by searching the text for a text stream that matches the identifier using a stylesheet; generating a hash table index comprising at least one key, wherein the at least one key comprises at least one of the plurality of identifiers; retrieving the tag associated with one of the plurality of identifiers at one or more of the plurality of levels, further comprising retrieving the tag from a local memory of a database if the tag is stored therein and retrieving the tag from a hard disk of the database if the tag is not stored in the local memory; associating the document with one of the top levels; performing data compression on the document text to form compressed document text; and inserting the compressed document text into the local memory and the hard disk of the database according to an insertion page comprising database insertion instructions provided in accordance with a second markup language.
 18. The computer-readable medium of claim 17, wherein the instructions further comprise: determining whether or not the local memory size has exceeded a target threshold size; and removing, if the target threshold size has been exceeded, the oldest information from the local memory to maintain the local memory size below the target threshold size.
 19. The computer-readable medium of claim 17, wherein the plurality of levels comprises one or more top levels and at least one sublevel associated with each one of the top levels.
 20. The computer-readable medium of claim 17, wherein the tag comprises text formatted in accordance with the first markup language, and wherein the first markup language is extensible markup language (XML). 